ABSTRACT 

The Evaluation and Development Blow Your Mind 
Conference was one activity initiated under the design grant for the 
Colorado Center for Training in Educational Evaluation and 
Development (at the University of Colorado). Prior to the conference, 
3 consultants (a psychology professor from the University of 
Washington, a representative of the Educational Testing Service, and 
the program director for the Bureau of Applied Research at Columbia 
University) spent 1 1/2 days evaluating the past and potential 
performance of the University of Colorado's Laboratory of Educational 
Research as a training facility in research and research-related 
areas. At the conference, these consultants were asked to 
free-associate and brainstorm on what evaluation, research, and 
development in education should be. This document is the lightly 
edited verbatim proceedings of the consultants' input and 
interactions with other participants. (GC) 

INTRODUCTION 



On Thursday afternoon, December 10, 1970, the Evaluation and 
Development Blow Your Mind Conference was held at the Red Lion Inn, 
just east of Boulder, Colorado. The Conference served as one 
activity among several initiated under the design grant for the 
Colorado Center for Training in Educational Evaluation and Development. 
The Conference was coordinated by Dr. William L. Goodwin, the Design 
Project Director, and was attended by a hearty (considering the 
inclement weather) group of 26 persons representing a variety of 
organizations, as can be noted below: 

1) Biological Sciences Curriculum Study, Boulder, Colorado: 

Dr. James T. Robinson, Consultant. 

2) Colorado Department of Education, Denver, Colorado: 

Dr. Arthur R. Olson, Director, Assessment Evaluation Unit. 

3) Denver Public Schools, Denver, Colorado: 

Mr. Barry Beal, Supervisor. 

Dr. Jerry Elledge, Supervisor. 

4) Earth Science Educational Program, Boulder, Colorado: 

Mr. Larry Irwin, Associate. 

Dr. William D. Romey, Director. 

5) Interstate Education Resources Service Center, Salt Lake 

City, Utah: 

Dr. Brent Gubler, Director. 

6) John F. Kennedy Child Development Center, University of 

Colorado Medical School, Denver, Colorado: 

Lila R. Wegener, Intern-Trainee. 

7) Social Sciences Education Consortium, Boulder, Colorado: 

Dr. Irving Morrissett, Director. 

8) Southwest Cooperative Educational Laboratory, Albuquerque, 

New Mexico: 

Dr. James C. Moore. 

9) Southwest Regional Educational Laboratory, Inglewood, 

California: 

Dr. Mos Okada. 

10) University of Colorado, Boulder, Colorado: 

Fellows and Students of the Laboratory of Educational Research 
Beverly Anderson 
Richard Bennet 
Evelyn Brzezinski 
Nancy Burton 
Arlen Gullickson 
Norris Harms 
Larry Nelson 
Susan Oldefendt 
Rory Remer 
Todd Rogers 

Faculty 

Dr. Thomas Barlow, Associate Dean of the Denver 
Center School of Education. 

Dr. Gene V Glass, Associate Professor and Design 
Project Staff Member. 

Dr. William L. Goodwin, Associate Professor and 
Design Project Director. 

Dr. Kenneth D. Hopkins, Professor and Design 
Project Staff Member. 

Dr. Gerald W. Lundquist, Assistant Professor. 

Playing principal roles at the Conference were three consultants: 

1) Dr. Arthur Lumsdaine, Professor of Psychology, University of Washington. 

2) Dr. Sam Messick, Educational Testing Service. 

3) Dr. Sam Sieber, Program Director, Bureau of Applied Research, 

Columbia University. 

These men spent Wednesday, December 9, and Thursday morning, December 10, 
evaluating the past and potential performance of the Laboratory of 
Educational Research as a training facility in research and research- 
related areas. On Thursday afternoon, the consultants were asked to 
free associate and brainstorm on what evaluation and development in 
education should be like and, consequently, on what training experiences 
evaluators and developers should undergo. 

What transpired is recorded, for the most part verbatim, 
herein. The consultants were sent transcripts of the Conference and 
asked to edit sparingly (in order that the spontaneity and original 
flavor of the Conference thereby might be preserved). It is obvious 
that if the consultants were writing on (rather than discussing) 
the same subjects, their products would be somewhat more polished 
and organized. Still, the contributions made by each of them seemed 
perceptive and noteworthy, and are presented here as inputs to be 
considered in this general area. 

THE EVALUATION AND DEVELOPMENT 
BLOW YOUR MIND CONFERENCE 



This is really a very unstructured assignment, 
although it has some precedents. I think that among us 
there have been some past expressions of disagreement 
with respect to the relationship between research and 
evaluation and their relative importance. I guess I 
should identify myself as a heretic and, furthermore, a 
renegade because I was raised as a basic researcher. 
Starting with conditioning under Jack Hilgard at Stanford, 
I gradually progressed through a series of educational 
research studies in the classical hypothesis-testing 
paradigm. I have arrived at a point where I seriously 
question whether the most fruitful way to proceed, in 
terms of improving education, is in fact through the 
classical (if I can use that phrase) model of nineteenth 
century physics: one goes into the laboratory or 
equivalent and does basic research; from that derives 
implications, which are supposed to say something about 
what educational practice should be; next one does 
dissemination or diffusion or something like that and 
these basic principles somehow translate themselves into 
practice. 

I'm sure that this happens to some extent, but I 
guess that the position that I would take (at least for 
the sake of argument and really a little bit more than 
just for the sake of argument) is that the educational 
research and development dollar could well have a larger 
portion of itself expended on something more like an 
engineering model than a science model. That is to say, 
a development, evaluation, and test model, oriented around 
the development of educational products or programs, the 
empirical testing of them, the use of data to improve them, 
and the arrival at general findings arising in the context 
of this quite frankly applied, engineering-type development. 

I hope that the term "engineering" doesn't turn you 
off; what I really mean is the attempt to develop and 
improve (through research and applied-research techniques) 
a useful educational product with concern always, of 
course, as to the extent to which what is found in one 
situation can be applied next door, next year, across the 
country in a slightly different context, and so on. 

Well, we have had several spirited interchanges, particularly 
Sam Messick and I and a few of our other colleagues, on 
this general proposition; and I think that maybe the 
stage could be set (entertainingly if not usefully) 
for some further discussion by asking Sam to identify 
some of the points of disagreement that he perceives. 

Okay. It's a little bit difficult to know how to 
proceed at this point, but let me make a few general 
remarks, first by starting out specifically, then becoming 
more general, and then, hopefully, coming back to the 
more specific again. Specifically, I would like to take 
exception to the engineering paradigm as a way to proceed 
in educational evaluation and development, being deliberately 
a little bit unfair to Art in this position and setting 
up somewhat of a straw man. However, I'm also setting 
up a straw man because I'm concerned that the adoption 
of an engineering paradigm in educational development 
and evaluation (although technically it is feasible for 
us to worry about generalizability of the findings next 
door and next year) would make such generalization unlikely. 
Rather, I feel that we should be adopting a paradigm that 
puts much more emphasis on process, that is, concern not 
with just assessing the size of effects and finding them 
good or bad, but concern with understanding the processes 
which produced those effects. The engineering paradigm 
in its simplest form is essentially concerned with 
input-output differences relative to cost; and I would 
argue that that's not enough. We have to be concerned 
with the context in which the output differences occur, 
with the processes that produce the outputs, and, notably, 
with the antecedents and consequences of the total 
enterprise. 

Let me become a little bit more general at this point. 
If we ask what educational development should be and 
what educational evaluation should be, it seems to me that 
we can't proceed very far in answering these questions 
without asking, "For what?" Educational development 
for what? Educational evaluation of what? Very early in 
the game we have to worry about the goals of the educational 
programs. That is, educational development and educational 
evaluation cannot be considered separately from educational 
programs and their specific content. Educational programs 
cannot be considered separately from the goals or values 
of education. I argue strongly that none of these things 
can be considered separately from educational research. 
I argue that evaluation of educational programs ought 
to be research on educational process and nothing less. 
And to a very large degree, educational development should 
be the same thing. 

At Educational Testing Service, we have several 
divisions with somewhat differently articulated missions. 
We have a division that is called the Division of Test 
Development; I'm sure you all know from its name what 
it's supposed to do. What is test development? What 
we consider test development to be might give us some 
lessons as to what educational development might be or 
mean. At Educational Testing Service we also have a 
Division of Developmental Research; it's not so clear 
what this division is supposed to do. In our Developmental 
Research Division, the concern is to try to understand 
the dimensions of a problem and the variables that are 
influencing an educational problem in order to develop 
options for solving that problem and then evaluating 
those options with respect to their relative effectiveness. 
Thus, the task is trying to understand the nature of the 
variables and doing the hard intellectual work that's 
involved in construct validity research in order to 
develop test specifications of the processes to be assessed, 
the specific content to be assessed, and whatever other 
dimensions are involved. Once those specifications are 
clearly in mind, and maybe a prototype item or two has 
been developed, then the Test Development Division comes 
into play. The Test Development Division, then, is a 
group of subject matter experts that develops items to 
meet fixed specifications. If you don't know what the 
specifications ought to be, then that's not development 
in our particular ethnocentric view of the problem. 

As we consider the process of educational development, 
I think we have to worry about what we mean by that process. 
Is it the generation of products to meet specifications 
that are well understood or is it an attempt to understand 
what those specifications ought to be? I would argue that 
at this stage of development, we have to understand what 
the specifications ought to be. At this stage our concern 
is with educational evaluation; we have to understand how 
effective the approaches are and understand the processes 
that produce those effects. I view both of those enterprises, 
both development (as I've talked about it) and developmental 
research and evaluation, as evaluative research. From my 
point of view, we should maximize the overlap between 
evaluation and research and between development and 
research. 
This is not to imply that other approaches to 
development might not be required because of the press 
of circumstances; that is, we might have a problem 
facing us that must be solved immediately. For example, 
the social fabric of the schools is disintegrating; 
we can't take five years out to engage in a research 
program to understand how to best proceed in the future. 
This problem and others have to be addressed now. One 
consequence of this type of pressure is that we might 
develop programs which are not researched, programs where 
we don't pay any attention to understanding the basis 
for their operation. 

By the way, there's another subtle political pressure 
that is operating presently in Washington. This pressure 
is a call for research on the excuse that we don't know 
enough to act. Just two years ago the pressure was 
exactly the opposite; then it was a call for action, 
labeling research as a frill. But now it is a call for 
research because the knowledge base isn't substantial 
enough to indicate how to act. The reason for that switch 
is very clear to me; I may not be right, but I think I 
perceive it clearly. The reason now that there is a call 
for research is that research is really very inexpensive. 
You can do a lot of research for $300,000; you can't do 
much action. So the call for research is essentially an 
excuse for not supporting important action programs. 

But I don't see this as a dichotomy; I don't see 
why we have to talk about a pendulum swing. There is a 
viable alternative, that which I called evaluative research 
(and that Kurt Lewin 30 years ago talked about as action 
research). This strategy is used when, faced with pressing 
social problems which must be addressed now even given 
all of our ignorance on these issues, we develop action 
programs based upon the best available ideas and knowledge, 
and we do it now. We attempt to implement those programs 
immediately; we don't wait for five years of research. 
Additionally, we build into the implementation of those 
programs provision for collecting information, relevant 
information as to how the program is operating, information 
about whether or not effects are being produced, the size 
of the effects and information about the processes that 
produce the effects. That makes the action program an 
action research program. I would argue that action 
programs without that research component are a waste 
of time because it is unlikely that we will understand 
enough about unresearched action programs to generalize 
therefrom to other settings, to other individuals, or to 
other times. That's a fairly drastic way to present the 
case but nonetheless I think that a kind of (if you will 
pardon the expression) "black-white thinking" would 
warrant it. 

Let me just state my position on this very briefly; 
I think Art wants to rebut. I agree with Sam that 
evaluation should really be looking at the process, the 
procedures by which you get the outcomes. Otherwise, 
you cannot generalize the results of the evaluation to 
other settings, to other types of students, to other what 
have you. In that respect, I think I disagree with Art 
on the emphasis on the engineering-model product evaluation. 
But then I want to carry that indication further: What 
does that do to our research design? If we have to look 
at every possible condition, constraint, facilitating 
situation, and procedure involved in every kind of program 
or educational practice (and being a sociologist I'm more 
concerned with the evaluation of larger scale programs, 
organizational changes, etc., rather than particular 
little changes in curriculum, in a certain grade, for 
example), what does this do to our emphasis on quantitative 
experimental design? If your evaluation considers processes, 
you have to look at everything. You have to take into 
account the SES background of students (maybe even of the 
teacher), the teacher's personality, teacher enthusiasm 
for the new program that you are inserting, how long they 
have been in that school system, the school system's 
innovative climate, the principal's backing up of the 
program, the amount of pressure on the school to perform 
well in this new program, etc. All of these things 
might be contributing to that outcome you are getting. 

You are not going to be able to design an experiment which 
is sufficiently elaborate unless you have most of the 
schools in this country, I think, to control for all of 
these procedures, constraints, opportunities, and so 
forth, in that situation. So I am wondering if this will 
necessitate a shift to more qualitative types of observation, 
say impressions, rather than attempting to quantify the 
outcome of every variable in that situation that you're 
interested in studying. 

That's a very good point. Essentially you are saying 
that in education, as we consider the effectiveness of 
a particular program or product, we have to recognize 
that it's occurring in a context which is essentially a 
system. It is a very complicated system because it is 
a system that is addressed essentially to the whole problem 
of human development. We recognize that educational 
growth and the understanding of educational growth can't 
really be separated from the understanding of human growth. 
The system we are dealing with in education embraces not 
only the child, his teacher, and the program, but also the 
home, the school, the peer group, and the community. 
In order to deal with that problem properly, we have to 
address the possibility of interactions occurring and 
that means we must utilize multiple measurements. There 
is no alternative to that as I see it. The alternative 
of simplifying reality by controlling variation is one 
possibility (of going into the laboratory and making 
simplified but hence artificial situations that hopefully 
will help us understand part-processes), but not a 
particularly good one. That is, we may understand 
part-processes generated in the lab which, when we try 
to generalize to the real world, are found not to generalize 
very well. Thus one consequence of the point of view 
being advanced is that educational evaluation and educational 
research must be multivariate and must be interactive. 

I would also argue that it must be longitudinal and 
comparative and that if the research enterprise is not 
longitudinal (in the sense of having multiple measures 
over time), we can't understand the processes. Research 
which is multivariate, interactive, and longitudinal is 
very complicated and must involve very complicated designs 
and very complicated multivariate analysis. 

I think we're speaking of designs that we really haven't 
thought through yet . . . 

Right. 

Just haven't arrived at yet. "Strategies", rather than 
"designs", may be a better term. 

I want to go back to the remark before last before last. 

The ante-penultimate remark. 

I thought that I was going to pick a fight with Messick and 
I found myself discouragingly in agreement, from one point 
of view, with what he said. I guess that maybe I never should 
have used the term "engineering" and maybe some of those 
polysyllabic adjectives; this kind of term, if we aren't 
careful, may do more to obfuscate than to clarify. Since 
there isn't time for jokes, I thought that I might read a 
little thing that a friend of mine handed me the other day; 
some of you may have seen it. It says in large type, 
"No wonder we don't communicate" and under this in smaller 
type it says, "You say you understand what you think I said. 
What you don't realize is that what you heard is not what 
I meant." 

When I talked about an engineering model, I was really 
saying, or trying to say, very much what you subsequently 
said was what we ought to do. That is to say, you're 
saying that the press of events, the need for educational 
improvement and innovation is such that we can't possibly 
do all of the background research necessary. Rather, we 
have to, in fact, take action; that is Point 1. And I 
was really trying to say that same thing. I despair of 
trying to do all of the fundamental research and then 
deriving from that the principles necessary to make the 
decisions on what kind of action programs we are going to 
engage in to meet pressing educational, societal problems. 
So, we're in agreement on that; that some action, some 
development, is needed with at least comparable, and I think 
maybe considerably higher, priority than the more traditional 
on and on and on with fundamental research (out of which we 
trust, perhaps rightly, that ultimately some good will come). 

But then the next thing that you said, seemed to me, 
was that just development, just innovation, just new educational 
products or new gimmicks (if you want to use a pejorative 
term) or just new methods or approaches aren't going to help 
much, aren't going to lead us anywhere very much, unless we 
combine that with "action research" (I prefer to call it 
"evaluative research"). That is to say, evaluative research 
which is frankly applied in its orientation, which seeks 
to determine what a particular product or program accomplished, 
and how that compared with some reasonable alternative which 
has enough stability that we can define it, so that we know 
what we are comparing. 
This is exactly what I meant by an engineering approach. 
I meant to say, first let us see what the problems are 
(whether they're those of elementary reading or of getting 
a more intelligent non-bomb-throwing type of participation 
by college students in the political process or whatever 
they might be). Let us try to devise and implement programs 
that seem to meet the needs that face us, and then let us 
address a very substantial amount of our research effort 
to determining whether in fact the kinds of outcomes that 
we hoped these products would bring about are, in fact, 
brought about. However, we then need to go one step further 
because history never repeats itself exactly. What we find 
out about a particular product designed or a particular 
program implemented in Peoria in 1969 doesn't necessarily 
give us a completely secure basis for claiming that we can 
transplant the same thing to St. Paul or Austin in 1971. 

Well, it seemed to me, though, that you were saying that 
the way we get this assurance of reproducibility (that 
isn't the term you used but it's the one that I'll use), 
of exportability, is in terms of the evaluation of that 
particular approach. (I mean certainly to include hard-data 
evaluation to assess the extent to which outcomes demonstrably 
are realized not just whether the new product or program 
sort of looks nice.) How do we get this assurance that 
what we find has been accomplished by a particular program, 
reading program, mathematics program, or what have you, in 
a particular place can be depended on to produce similar 
effects in some future situation? What I further heard you 
saying (and maybe I was reading too much between the lines) 
was that one of the ways that we do this is to look at the 
process, the fundamental variables that seem to be involved, 
the many respects in which a new situation can differ from 
the original situation, and try to see if we have a theoretical 
basis for determining whether the proposed generalization to 
a new situation probably is secure or whether it is risky. 
I think, from my particular biases gleaned from the 
kinds of things I was doing for a fair number of years, 
I would say there is another basis for reproducibility, 
and that is the extent to which the method is embodied in 
a concrete set of materials and procedures which are 
physically reproducible and exportable. If you take the 
so-called "programmed" materials, or film programs, or 
video-taped programs, or computer programs, they suggest 
programs that are embodied to a considerable extent in things 
which are inherently physically reproducible in terms of 
educational media, then that is another basis (not the only 
basis but another basis) for trying to assure reproducibility. 
At least you have greater assurance that the educational 
stimulus will be the same in the new context once the program 
is transplanted to a different state or a different school 
system, than you would have if the basis for similarity 
and reproducibility is only some kind of "method" stated in 
abstract terms that is diffused by a type of multi-stage 
diffusion process such as teaching it to teachers of teachers 
of teachers. 

So I think this is another basis for exportability which 
needs to be given serious attention. In terms of the 
development and evaluation paradigm, the development of specific 
products of this kind has a concreteness (that again suggests 
what I thought I was trying to imply by the engineering 
model) that just the diffusion of methods or the dissemination 
of principles of methods does not have. 

I think the danger here is in taking you too literally 
on the emphasis of products; that is, there are really two 
problems that we're concerned with. One is the problem of 
generalizability or reproducibility that Art has emphasized. 
The other one is the problem of interpretability. It seems 
to me that we must be concerned with understanding the 
basis from which the effect is produced and attributing 
that effect to an appropriate kind of treatment variable. 
In the current engineering model, emphasis is on how you 
get to the output from the input (that is, the outcome-input 
differences) and (as stimulated by Washington) on cost- 
effectiveness; we have to consider output relative to cost. 

Rather than the traditional engineering model, I would 
prefer to see used a model which is really a kind of a way 
of thinking, a way of emphasizing certain problems, a model 
which Michael Scriven alluded to . . . just kind of tossing 
it off in a sentence, in an article . . . saying that really 
a better model for educational research (for me that means 
educational evaluation and educational development) is a 
medical model. That is, a model concerned not with just 
input-output differences (but one that is concerned with 
those too, to be sure), but also with such things as side- 
effects. The engineering model, to me, faces the danger of 
emphasizing completely intended outcomes and then evaluating 
effectiveness in terms of the extent to which intended 
outcomes are met without being much concerned with unintended 
outcomes or side-effects. The medical model, it seems to 
me, is a much better model to guide our thinking, say, in 
the evaluation of drugs. It is just not enough to evaluate 
a drug by saying it produces the intended effect. Here's 
a drug that was supposed to reduce blood pressure and it does. 
If it also disintegrates the liver in the process, it's not 
acceptable. So we must be concerned with side-effects in 
that regard. 

I think it should be clear that there is controversy 
over this point: that there are people who will take the 
other position and say "no". We, Art and I, were at a conference 
together a couple of years ago in which one of the participants 
essentially stood up and said "no". He indicated that he 
was a member of the butter-wrapping school of evaluation. 
By that he meant that if you want to teach a person how to 
wrap butter, then only one thing is important in evaluating 
the training program, and that is to assess how much butter 
trainees wrap, and that's all. It's a very, very simple 
way to evaluate the effectiveness of the training program 
and the educational procedure used. Now I think it is easy 
to counter that approach. One way to counter it is to say, 
suppose there is a butter-wrapper who is the most efficient 
in the group, but who has gained his efficiency because 
of a particular process he engages in, a stylistic quirk, 
as it were. That is, he's very rapid because he touches his 
thumb to his tongue as he picks up the wax paper. Now you 
may not think that that is an acceptable mode of butter- 
wrapping, given hygienic standards, but he produces more 
wrapped butter than anyone else. There's a very important 
lesson to learn from that . . . what it implies is that you 
may not be able to evaluate the desirability of the outcome 
without knowing and understanding the process which produced 
it. To understand the process which produced the outcome 
is an enterprise in educational research which I see as 
paramount in the medical model approach I suggested here, 
and considerations of evaluation and development are part 
of that model's research process. 

I knew this was going to be a rap session, but I didn't 
know it was going to be a butter-wrapping session. 

Now what paradigm do you use to evaluate research? 

I think that's really what I hear all of you struggling for. 

This raises another issue. 

But isn't that basic? 

Yes, it's an issue that we're going to get to. I think 
it might be well to raise it now. If you want to say what 
educational evaluation ought to be, you first have to 
determine what kind of things you are going to evaluate; 
then I'll tell you, maybe, what it ought to be. And if you 
ask what is educational development, what should it be; 
well, again, the question becomes what kinds of things 
do you want to develop? How do you answer that question? 
What do you want to develop in education? Very quickly 
we get to the whole problem of the goals of education. 

What is it you want to teach young children in the first 
grade? Or consider a problem that many people are discussing 
currently, what should be the content of preschool education 
programs? What should be the goals of preschool education? 
What should we teach the very young child prior to his 
entering the formal school system? And how did we ever 
decide, in the first place, what to teach young children? 

It's very clear to me that the answer to that question 
is two-fold ... that there are really two issues just as 
there are two when we are asked, as scientists, to make 
a recommendation for practice. In the area of measurement, 
for example, if someone asks you whether they should use a 
particular test for a certain purpose, there are always 
two issues involved. First, is the test any good as a measure 
for what it's supposed to measure? The second issue is: 
should it be used for the intended purpose? Well, the same 
is true in the educational realm, more broadly. Is the 
procedure any good for bringing about the effects that it's 
supposed to bring about and should it be used to bring about 
those effects? The first question is a scientific one. In 
the measurement area, we have standards, psychometric criteria, 
for evaluating the adequacy of a measure, the most important 
of which is construct validity. The second question is an 
ethical one. It can only be answered in terms of the social 
consequences of applying the technique, the measure, or the 
educational program, and an evaluation of associated social 
consequences in terms of value systems. 
The problem that we're in as scientists is that we 
frequently delude ourselves into thinking that answers to 
the second question can be obtained from answers to the first 
question. It doesn't matter how good a test is or how 
effective an educational program is per se, for answers to 
the second question; the second question can only be answered 
by appeals to values and ethical evaluation. When we ask 
how do we evaluate educational research, development, or 
evaluation, it always has to be for what? We are at a loss 
as scientists to answer the "for what" question because 
we really haven't developed many of the techniques that 
would help. 

However, I do agree with Michael Scriven that science 
has a lot to say about answering ethical issues. Scriven 
says (and I think I agree with him, although I'm not sure) 
that ethics is a social science . . . that the methods of
empirical social science can be brought to bear on ethical
issues. This is particularly true with respect to the
very critical aspect of ethics which is to evaluate the
consequences of alternative approaches. I think, clearly,
that this is a possibility, but in the long run, the ultimate
decisions are going to be made in terms of judgment and in 
terms of value. We have difficulty dealing with that in 
a pluralistic society because there are different values 
and different opinions about what is good for young children. 

Right now we kind of cop-out on the problem of preschool 
education, which was the specific illustration I started with, 
by saying that whatever we give to very young children 
before they enter the formal school system should be good 
for them in the first grade. That is, whatever it is we 
are doing in the first grade, we should do it earlier. But 
that really is a cop-out because we should ask, "Why is it
good for them in first grade?" Is the answer, "Because they
need that kind of preparation for second grade?" And why
do they need it in the second grade and so forth; you can see
what's coming. Ultimately we end by saying that what they need 
in formal schooling is adequate preparation for effective 
adult-role functioning. That means that we have to turn to 
the nature of society and the nature of changes in society 
and the kind of adult-role function for which we would like 
to prepare children; and we are confronted with the fact that 
maybe we're not doing such a good job of that. 

I heard a story just the other day which produced a great 
deal of negative affect in me. It was a story of an Indian 
tribe in Central America which, because of a lot of ecological 
constraints, had stayed in the same place for years and years 
and years, literally for centuries. Because of mountains that
enveloped them and other factors, they just remained there.
But over the centuries, they were subject to a parasitic
infection which consequently caused blindness as individuals
aged. So most of the adults in the community, age 35 or 40
or older, were blind. There grew up in this setting a
non-formal educational system. It was essentially geared to preparing
the youth of this community for the inevitable blindness that
they would have to face. Many of the young in this country
feel that that is exactly what the educational system is
providing for them now; they're telling us that in no uncertain
terms. Now how do we respond to that? How can educational
research or development or evaluation respond to that in meaningful
ways? I'm not going to answer that question for a very good
reason.

I think that we have some sense that maybe some of you 
in the audience would like to ask some questions; I have 10
other comments that I feel it's essential to make, but I am going
to suppress all of them. 

I just thought that it might be a way of interconnecting 
several points that have been made to say that evaluation is 
not context-free. Over and over, we keep referring to the 
context of evaluation in several respects. We were talking about
the goals of, and the specifications of, evaluation. Some time
ago, at the beginning of this presentation, Sam stated that at
ETS once they have worked through the specifications, they can turn
the hack work of development over to the developers in that division.
Specification is the big problem, and I think that it should be a
big problem for any sophisticated evaluator who goes into the school.
It can be a big problem in that the practitioner does not know
what the goals are. You just can't take it for granted that he
knows exactly what he wants to get out of a new educational practice
or program. Even if he does know, you might spend a lot of time
trying to help him articulate and specify those goals and, of
course, it is up to you to operationalize them in some way. It
also might be the case that even if he states specifically what
his goals are, those really aren't his goals; he's misleading
you or he's misleading himself ... the goal of adopting the program
might be to improve community support for the school, to resolve
conflict among school board members over a particular segment of
the educational program, to get him a promotion . . . you don't know
what. So the evaluator often should diagnose and go behind the
problem that is presented to him, even if it appears, on the surface,
a clear-cut problem. Sometimes the practitioner gives you
too many goals, and it's impossible to fulfill all of these goals
in any kind of program; so you have to make him pare them down.

The entire process of specifying and articulating goals, defining
them, and diagnosing them is a very important stage in evaluation
which has only been alluded to, I think, by Sam who talked about
objectives.

He also talked about goals, I think, in the sense of policy 
research: should we uniformly accept the goals? Even if they
are good goals, rational goals, and would help children learn a
kind of curriculum, maybe there are more important
goals that practitioners should have; if we supply them, we are
inserting our own kind of judgment in the situation. Perhaps we
should, as generalists in education (as sort of mini-educational
statesmen, that is, small educational statesmen) . . . maybe we should
be thinking about what inner-city schools really should be doing
for children. Rather than trying to develop a program that is
supposed to teach them to read faster, maybe we should be saying,
"wipe out the lecture" or introducing an entirely different kind
of educational system or having kids write poetry or learn history
by painting or learn how to read by memorizing scripts for plays . . .
something like that . . . which may never have occurred to the 
practitioner. So what I am saying is that this is one situation 
in which context is important. What is behind the goals, behind 
articulating the goals, etc.? 

Another place that context comes into play, which we also 
talked about in the very beginning, concerns all the multifarious 
variables, the restrictive conditions in the situation, the 
facilitating factors, that are producing the outcome ... those
factors that we should know about if we are going to do process
analysis and generalize and reproduce the outcome or modify it
for other kinds of settings. 

Another way in which the context comes in, is in the utilization 
of evaluation. I think we are all taking for granted (and necessarily 
just to keep ourselves alive and keep our egos strong) that it 
makes a difference whether we evaluate or not. But I think very 
often, and maybe in a majority of cases, evaluation doesn't make 
any difference. In a certain sense, often our evaluation results,
no matter how beautifully developed and presented, just have no
impact on any situation whatsoever. The report is filed away
someplace in the USOE or in ERIC, or it never gets to practitioners . . .
they couldn't care less even if it does get to them, they have to
put it in action . . . you know, a whole multitude of reasons
(bureaucratic restraints, "it's going to cost too much," etc.),
so that very often evaluation leads to nothing, absolutely nothing.

One reason it doesn't lead to anything is because the evaluator 
never sits down with the practitioner in the beginning to talk
about how the evaluation is going to be used later on, to get
any guarantees of further testing-out of the results of the 
evaluation, or to determine how feedback will be provided 
to people who will be affected. 

These are some ways, I think, that we have to broaden our
conception about evaluation, and they certainly have implications 
for the training of evaluators. 

Sometimes the evaluations are deliberately misused for 
political reasons. 

Sure. 

We as evaluators will become pawns in the political process.

Let me interject one dissenting note to this. One way in
which you can make quite sure that the results of evaluation
are in fact utilized is when we are talking about what Scriven
refers to as formative evaluation, that is, if we think of 
a developer-evaluator team in which the customer for the 
evaluator is the developer himself. This is not a matter of 
passing judgment on something, of making an external administrative 
decision. Rather, it is the use of evaluative data, say in 
the case of a program on history, to find that certain concepts 
got across pretty well and that certain others didn't. Then 
you use this information (this is a simplistic example, but
will do) to decide what parts of the program you should pay 
attention to in trying to revise and improve it. In fact, 
you virtually guarantee in a limited, perhaps, but still 
I think quite real sense, that the results of evaluation will 
be utilized. Further (again in what is a limited but I 
think important sense), it is possible to ascertain whether or
not in fact you did what you were trying to do. There are
a number of instances that I could cite on such use of
evaluative data (that is, the use, in revising a program, of
hard test data on what aspects of the intended, and if possible
even the unintended, outcomes of an educational, instructional,
informational, or indoctrinational program got across with
an early version of it, and which ones did not get across).
In these cases, the data have been fed back in one or more
stages of revision in which the revised product has then
been tested comparatively against the original unrevised
product, and where it has been extremely clear that the use
of evaluative data in revising the program has led to a better
product than one started out with. This has been shown even
where the original product was one that already had been
through considerable development and evaluation of a 
qualitative, non-data-based form. So this is at least one 
important exception to the pessimism about the usefulness 
and use of evaluative data. 

I believe it is also important to consider the role that 
educational researchers and evaluators, and social scientists 
in general, are being called on to play in the political enterprise.
That is, many very large scale evaluation studies have
been undertaken in the past few years, primarily to gather
information that would convince politicians that certain things
are worth investing money in. Other situations are such that it
is almost impossible to gather relevant information about the
effectiveness of particular programs because of lack of foresight.
Title I, for example, put millions and millions of dollars into
the educational community. Since there wasn't any adequate
pre-test information available, it is very difficult to evaluate,
on a national scale, the effectiveness of those dollars. Other
programs that are smaller, like the Headstart Program, have
been evaluated in a variety of ways. Very recently a large
scale national impact study of Headstart was undertaken by the
Westinghouse Corporation; the constraints of the study were such
that people who knew the manner in which the study had to be
conducted, i.e., social scientists (by and large uniformly,
I think), predicted that the results had to be negative. That
is, the results had to come out to make Headstart look harmful,
or at least inadequate. Yet the study was undertaken, those
results were obtained, and they had political consequences. 
How could we as social scientists somehow have prevented that
from occurring? Standing up and saying, on professional and
scientific grounds, we feel that that evaluation should not
occur in those terms because it's unfair, we know in advance
the political consequences that are going to arise, and we do
not view this as a scientific enterprise. And yet, again, this
study was undertaken, but we were not organized enough or in
agreement enough to take a stand that would have had political
power in that regard. So, in a very real sense, we are pawns
of the political process just because we have no control over
the use of our evaluation reports or in the interpretation of
them. 

Should you have? 

Well, that's another question. It depends on whether you 
like the results or not. If they're to be misused, then you 
stand up and say, "You are misinterpreting the data." They
don't believe you or they don't take your suggestions into 
account. Sometimes, the political decisions are made before 
scientists are even aware of the data; the Westinghouse report 
was leaked politically prior to any opportunity for scientific 
review and evaluation. It affected the political climate negatively 
before there was any feedback available. 

This is the dilemma we're in; that's why I asked my initial
question. Ultimately you are asking what criterion you use
to validate research. You can use the qualitative approach,
I presume; you gave an excellent example of people establishing
an informal educational system to teach and prepare for blindness
which really wasn't needed. I think historically you can denote
statisticians who have cited figure after figure, for example,
justifying the need for more dollars to support welfare when
we really ought to be trying to determine what is causing welfare.
You know, this kind of a thing. So now we are back to the
affect or the value domain, and we can evaluate that in terms of
a group of specialists, scientists, or professionals or we can talk 
about a democratic process which operates on the majority principle. 
We'd like to have a system where the democratic process
may be applied in an informed way. Let's go back to the
Westinghouse study, for example. When that was undertaken,
it was undertaken as an overall global summative evaluation of
the effectiveness of Headstart at a time when that was not the 
political issue at all. That is, the evaluation wasn't pointed 
toward the issue at hand. The political decisions about Headstart 
essentially had been made. It wasn't any question about whether 
it was any good or not; that was like asking whether parks are 
any good. How do you evaluate whether parks are any good?
Headstart, for better or worse, for good or ill, was a
remarkable political experiment in the sense that it converted
a clientele into a citizenry. Because of the political power
of that citizenry, Headstart was not going to be terminated.
The important question was not whether Headstart was any good
but, rather, what are the aspects of the preschool program that
produced differential results. What can we do in preschool
programs that will foster growth on cognitive, personal-social,
and affective dimensions? The Westinghouse study didn't really
bear on that issue. It kind of muddied the waters, in a way.

Let me try to present the problem in a somewhat different 
way. I am really concerned with the role of social science 
in the political process and want to point out and emphasize 
the fact that educational evaluation is in the political process 
whether we like it or not. Being pessimistic considering the 
present political climate, it is more likely that evaluation 
results will be misused than used properly on the national scene 
or on the state level. A friend of mine, a sociologist, posed 
the problem in the following way. He said that whenever there 
is a complicated enterprise in which many parties are involved 
and the enterprise fails, the finger of blame is always pointed 
toward the weakest party. For many many years that finger of 
blame was pointed toward the child in the educational enterprise. 
He failed. If only he worked harder, if only he tried more,
or, perhaps, if only he came from better stock, but it was 
essentially his fault ... he failed. Then for a variety of 
reasons the finger subtly changed and moved away from the child 
and pointed toward the family. The family wasn't providing the 
appropriate learning supports, the appropriate kind of motivations, 
and perhaps, the appropriate genes. It didn't stay at the 
family very long because parents don't like to be pointed at 
in that regard, and there were other good reasons why the finger 
moved on. Rather it moved on very clearly to the school. 

The school is now the culpable party. The enterprise is interpreted
as failing, and the blame is being put on the schools. I sense
now that the finger is moving again very subtly, and it is moving
directly toward social science. That is, the schools, the
families, the children, and the politicians are saying:
"Okay, the enterprise is failing, schools are not doing a good
job, and we admit it. Families, for whatever reasons, are not
doing a good job, and we admit it. The kids are still failing.
The problem is to teach the kids, not to fail them. Tell us how
to do it better." The finger is pointing at social scientists,
and I think it is not so much a sign of culpability as it is
a sign of a weakness. It is pointing at us because we are the
weakest party in this enterprise right now. How can we become
stronger? How can we organize or engage in educational research,
development, and evaluation in a way that we will have some impact
and power? That's a question I don't know how to answer.

We have been collecting reams of data, particularly in 
Title I, on the context of the educational enterprise, yet we
have not been able to relate it to what few outcome measures we
have. I think what you are talking about in terms of context
is important but I'd like to hear more, especially for the developers.
Would you care to elaborate? 
I'm not sure that I understand the question; are you saying
that you have been unable to relate your inputs to outcomes? 

We collect data on context, socioeconomic status of the 
youngster and all the rest. We also collect some outcome measures.
In Title I, he's disadvantaged in the first place and if we find
that he isn't doing any better, we might be inclined to say,
"Well, what can you expect," and the like. Can't we do better
than that in terms of relating outcomes to context, and then by 
trying to do something about the context? 

I don't have any answer, I'm sorry. I don't know what 
strategy it would be. I feel that I could find some answers 
as a kind of qualitative observer as well as a quantitative 
researcher using a variety of techniques, not just experimental 
design but also elaborate questionnaire and statistical controls, 
depth interviews, etc. That is, by studying that complicated 
program with every kind of technique that is at our command as
social scientists and somehow coming up with results that this
variable had an effect, that a second variable did not have an 
effect, that a third variable seemed to be a hindrance, and so 
forth. What we need is a much more formal kind of strategy 
or scheme for this, to produce the best evaluation possible 
right now. 

You have very little to offer the curriculum developer, 
it seems to me. He still is at a loss as to what to do about
change. 

You can look at differential effectiveness of change within 
context although there are some arguments about this. There is
an approach called "educational performance indicators" that 
operates at the program or school level and that takes into account 
the prediction of outcome by contextual and hard-to-change 
characteristics in the situation, like socioeconomic status. 

You find that you can get a nice regression line with schools 
varying around the line. In general, schools located in high
socioeconomic communities (with better resources and other things)
produce better outcome results than schools located in low
socioeconomic communities. To evaluate the differential effectiveness
within contextual level, you would ask, "Are there schools at
the low end, admittedly not producing very big outcomes, that
are producing better outcomes than you would predict given
the resources they had available and given the input characteristics
of the children?" You might find a school that is doing a very
good job in the sense of being far above the regression line
down at its low end whereas another school at the upper level
(given all the good resources and the "desirable" student input
characteristics) might be far below the regression line. If
we are concerned with evaluating schools, we might say this quite
rich school, using rich in a very global integrated summary
sense, is not doing as good a job as this poor school because
one is above the regression line at the low end and the other is
below the regression line at the upper end. Then you could ask
what are the correlates of the deviation from that regression
line, and this would give you a handle on procedures that could
be introduced into other schools to improve effectiveness within 
their own particular context. 
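
[Editorial illustration, not part of the proceedings.] The performance
indicator idea described above can be sketched in a few lines of Python.
This is only a minimal sketch under stated assumptions: the school names,
the single "ses" context score, and the outcome values are all hypothetical,
and a one-predictor least-squares line stands in for whatever model an
actual performance indicator system would use.

```python
from statistics import mean

# Hypothetical data: "ses" is a context score (hard to change),
# "outcome" is a mean achievement score for the school.
SCHOOLS = [
    {"name": "A", "ses": 1.0, "outcome": 48.0},
    {"name": "B", "ses": 1.0, "outcome": 32.0},
    {"name": "C", "ses": 5.0, "outcome": 62.0},
    {"name": "D", "ses": 5.0, "outcome": 78.0},
    {"name": "E", "ses": 3.0, "outcome": 55.0},
]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def deviations(schools):
    """Map each school name to (observed outcome - outcome predicted from context)."""
    xs = [s["ses"] for s in schools]
    ys = [s["outcome"] for s in schools]
    slope, intercept = fit_line(xs, ys)
    return {
        s["name"]: s["outcome"] - (intercept + slope * s["ses"])
        for s in schools
    }


if __name__ == "__main__":
    for name, dev in sorted(deviations(SCHOOLS).items()):
        print(f"school {name}: {dev:+.1f} relative to the regression line")
```

With these made-up numbers, school A sits at the low end of the context
scale but above the line, while school C sits at the high end but below it,
even though C's absolute outcome is higher than A's. The deviation measures
effectiveness relative to context only; it says nothing about whether the
absolute level of outcome is acceptable.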

This performance indicator approach has been criticized 
extensively by people who are concerned about the poverty community 
and ethnic minority groups by saying that we might become smugly 
satisfied with fixing it so that all the schools down at the low 
end go up above the regression line although they are still very 
low, in an absolute sense, on outcome measures. They say,
"That is not enough. Ultimately we don't care whether we're
above the regression line or not; we care about the absolute
value of the outcome. We want the regression line changed and
we would like to see it flat and very high." 

What you just said seems to me to be a very important 
ingredient in this consortium concept. That is, if we can find 
evaluators who can supply what you're talking about, then the 
evaluators are going to be able to supply information for so-called 
developers who then can do something rather than just operating 
in a vacuum. I know that the state of the art is pretty shallow 
at this point, but it is the part that I think we need to
concentrate on, otherwise we have nothing for the developer
to do with what the evaluator is supplying. 

One of the problems is the pluralistic nature of the 
clientele that we are dealing with, but as long as we can agree 
on certain outcomes as being very valuable and very positive 
then I think that strategy is a good one. But there are large 
segments of the educational community that are saying that 
they do not agree with the outcomes. How do we respond to 
that? Do we say that it is okay, that in a pluralistic society 
any sub-group within the larger culture has a right to determine 
its own destiny, its own values, and its own goals ... even 
though those goals might be counter-productive in the sense 
that they are not supportive of ultimate survival in the larger 
society?

Well, one thing that you can do, of course, is try to
"present them with the facts," to use that trite expression.
That is, you can at least try to show what outcomes eventuate
from particular contexts. This includes both intended and,
to the extent that you have the wit to anticipate them, the
unintended outcomes resulting from a particular educational
procedure. The assumption is made that the procedure has some
element of reproducibility in that the evaluator can say,
"If you do this, you will get that result." Such information
is useful even if you can't fully agree on how desirable it
is to achieve this outcome or that outcome. I think that I would agree
fully with what I understood you to be saying. This use of
evaluative data is of great value; at least you know what is
happening as a result of what you're doing, and you are, at
least potentially, in a position to do better.

Seemingly implicit in the discussion that preceded was 
the notion that the relevant variables, and the measures that 
define the dimensions of these variables, are known. As a 
developer with some playing around with evaluation, I have a
very different feeling about that. 
On the contrary, my remarks certainly assumed that that
was the problem. If we knew the nature of the variables that 
were operating, the dimensions of the problem, then development 
follows relatively routinely from that understanding. 

Except that in your model, there seemed to be one thing 
missing that is quite critical. That element is evaluation of
the outcome of the development group in terms of the design
which was handed to them. It seems to me that in a lot of 
cases, both in development of measurement devices and also 
development of curriculum, the specifications are sometimes 
quite well done, but the carrying-out of those specifications 
by the development group falls very short. Still, no one goes 
back to systematically examine the process of development. 

I was assuming that development would always be evaluated
and redirected appropriately. Again, let's take the development
of tests as an example. Elaborate machinery has been developed
that essentially asks, in terms of empirical evidence, do the
test items that were developed to meet the specifications in 
fact meet the specifications? 

There's a heck of a lot of judgment that enters in rather 
than just empirical evidence.

Well, in the development of a particular test even with 
highly-defined specifications, you typically develop twice as 
many or three times as many items as you ultimately plan to 
use; you throw away items that are found wanting.

Wanting in terms of what criterion, Sam? 

Wanting in this sense; let's take a simple example. 

Assume that you have a specification grid that indicates that 
you'd like to have an achievement test that would cover a 
subject matter area, say the typical curriculum in American 
History. But there are certain processes that you also want 
to tap, and you would like to have coverage of all the processes 
over all the content topics. You might decide to write 10
items for each cell. Well, that's a hypothesis . . . you are
hypothesizing that the items that you wrote indeed cluster
together empirically. Next you get information about the 
intercorrelations of the items, item-total correlations,
and things of that type that enable you to evaluate whether
or not you've done a good job in covering that domain. At
the same time (and this I think is very important), you also
can get information that might indicate that the domain as
specified was incorrect. Frequently in looking back from the
empirical data to the original specifications, we notice that
what was believed to be a single cell is really two dimensions.
You then can go back to the specifiers, and they might agree
that an important element of the situation was neglected.
Therefore, they change the specifications on the basis of
empirical results. If this were done routinely and continually,
it would lead to a theory of achievement in each subject matter
domain. I am not suggesting that we do that, however. Certainly
we at ETS don't nor does anybody else; as a result, there is
no theory of achievement in any subject matter domain. 
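
[Editorial illustration, not part of the proceedings.] The item-total check
mentioned above can be sketched as follows. This is a minimal sketch, not
the procedure used at ETS: the response matrix is invented, the 0.2 cutoff
is an arbitrary assumption, and each item is correlated with the total of
the remaining items (a "corrected" item-total correlation) so that an item
is not correlated with itself.

```python
from math import sqrt

# Hypothetical scored responses for one cell of the specification grid:
# one row per examinee, one column per item (1 = correct, 0 = incorrect).
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def item_rest_correlations(responses):
    """Correlate each item with the total score on all the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def flag_items(responses, cutoff=0.2):
    """Return indices of items whose item-rest correlation falls below cutoff."""
    return [
        j for j, r in enumerate(item_rest_correlations(responses))
        if r < cutoff
    ]


if __name__ == "__main__":
    for j, r in enumerate(item_rest_correlations(RESPONSES)):
        print(f"item {j}: r = {r:.2f}")
    print("items found wanting:", flag_items(RESPONSES))
```

In this invented data the last item does not hang together with the other
three, so it is the one flagged. Whether a flagged item should be discarded,
or whether it signals that the cell really contains two dimensions, remains
a judgment about the specification and is not settled by the statistic.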

When we go beyond that rather simple kind of problem, 
it gets even worse because of the frequently-occurring hiatus 
between a cluster of items, even supported by empirical data, 
and the specifications. Judgment seems to enter in the 
resolution of such discrepancies. For example, I can say that 
the items I have developed really get at this idea, and someone 
else can reply, "the heck they do." There's no resolving criterion 
except the way each of us feels. 

At one simpler level below that, the question becomes 
(even with judgment), whether or not all 10 items developed for
a certain cell get at it to the same degree. Empirically, we
can decide on each item. Whether it's an important thing to get
at is still a judgment but we can say that you have written 
only 2 items that are any good, not 10. 
It would seem, then, that a major part of the need in
evaluation and development is for a tremendous input into
devising and searching for variables, new variables, and an
input directed toward perfecting measurement devices.

You're right. I think that is a very critical issue. 

The really difficult thing we have to face is understanding 
the dimensions that are operating in the problem area. What 
are the variables that are important to assess? People consider
that question to be in the research domain, and research is 
a frill these days although, as I noted, research is much cheaper 
to support than action programs. 

Also, in terms of funding, we operate in educational 
development on a very different scale than in most other areas.
We really have very little money. I was present when someone
in Washington asked, "How come educational researchers can't
tell us what to do?" (This happened to be in the early childhood
area.) The question was: "How come with all the research that
has been done in child development and early education, you
don't know what to do? Why, that's terrible!" In the ensuing
discussion, it was pointed out by a child psychologist that the
questioner really didn't understand the dimensions of the funding
problem. It was pointed out that the operating budget for
Project Headstart during its first year (which was $300,000,000)
would have paid for all of the child development research all
over the world for the past 40 years, including Jean Piaget's
salary. Now if you translate that into other units like aircraft
carriers, you can see we are really talking about a pittance. 

Even in areas that are well-researched, there are important 
development problems. Take, for example, certain kinds of
verbal aptitudes. We have 50 years of research behind us on
the nature of the dimensions that are operating. Where you have
prior research and evaluation, then the developmental problem
becomes a really critical one. It involves how to get
impeccably-developed instruments that have optimal properties. If you're
working in an area, say self-concept, where the dimensions
are not well-defined, then the development problem must be
intimately related with both the research problem and the
evaluation problem. At the training level, you are not going
to train people to be developers in these narrow areas, that is,
to be developers in self-concept or developers in verbal aptitude.
Since development is intimately and inextricably intertwined 
with problems of evaluation and research, the training of people 
to be developers must occur in those contexts. 

Sam, a while ago we found, or I found, that we were in agreement
when I thought we were in disagreement. Now I think we are in
disagreement when I thought we were in agreement. I agree with
your statement that in those areas where the necessary research
has been done, then it's primarily a question of development. 

The only thing is, I don't think there is any area like that. 

Okay. 

I think that in any area we consider, we have to check 
whatever presuppositions we have from background research with 
the sort of "proof-of-the-pudding" evaluative data that we 
collect. 

The other point that I would like to make is in relativistic
terms. It doesn't seem to be a question of how much percentage-
wise should go into development or evaluation, but rather that
more should go into those enterprises that intimately combine
development and evaluation so that we're not talking about one or
the other, but rather the combination of the two into a single 
activity. 

Organizationally, this has raised some rather interesting 
questions in terms of objectivity. If you intertwine evaluation
and development to such an extent that they are no longer 
separable, then presumably the same people in the organization 
are determining them both. 

Very good point. I think that this again raises the
question of two different kinds of evaluation, formative and
summative. I personally feel that the most mileage is to
be gotten out of formative evaluation in which the evaluative
data feed right back into an iterative development-improvement
cycle. But also at some point you do have to have, by some
means, a non-incestuous kind of evaluation in which someone
from the outside comes in and says, "Okay, you feel that you
have done your best using evaluative data to make changes.
Now we are going to make an independent assessment of how well
you have done."

I'm going to try another idea because I can conceive 
of a situation where you develop an expertise to take the
data accumulated by the evaluator, transfer it into layman's 
language, and, using the public media, create a different kind 
of evaluation mechanism. Maybe that is what we're lacking. 

Now I'm back to one of my initial statements. Have 
we gone so far astray in terms of our jargon and in terms of our
data-gathering processes that we have somehow left the animal
(that's supposed to benefit from it all) by the wayside? I
just raise that question. 

I take it that that's a rhetorical question. 

It may be, but it is a practical question. 

Maybe we left the animal that is supposed to decide who's 
to benefit from it by the wayside. 

But he can't decide unless he has some of that data.

We have to leave in a moment, but I would like to emphasize
that I believe where Art ended up on this last issue was in 
support of my conclusion. He didn't particularly like the 
premise from which I arrived at that conclusion. 

The retort courteous to that is, "You keep off my premises."

One other point. I'm essentially a personality researcher 
and I know that area fairly well. To do personality research,
you have to have measures of the variables that you are trying
to interrelate and understand. So you develop measures in terms 
of your best understanding, you evaluate the adequacy of those 
measures, you do further research which leads you to reconceptualize
variables, you then redevelop the measures which leads 
you to re-evaluate them, and you go through an iterative cycle 
in this way. The original motivating force was research-oriented,
that is, trying to understand the nature of the variables in
the system. The process that you went through taps development,
evaluation, and research, and it is iterative. If it's a continuous 
iterative process for the developer or the evaluator, then I 
don't see that it matters where you enter the iteration, as long
as you replicate it several times. But if you enter the 
iteration, say, at the development stage, and you develop and 
that's all, then I think inevitably, regardless of the area, 
that product is going to be found wanting. 

I also don't believe that it's a good idea to enter and 
engage in the iteration only once, no matter where you start. 

That is, if you start with a presumed understanding of the 
research domain and say that research leads to development which 
leads to evaluation, and you find that the result is good and 
you bless it ... almost certainly it's inadequate. Research 
should lead to development to evaluation to research to development
to evaluation ... we should recognize the iterative nature of
the enterprise, and we should recognize that the skills and 
demands of the situation are intimately interrelated. We should 
recognize this at all levels, including the initial graduate
training for this effort. As a researcher, I seem to keep
emphasizing that research is a critical part of this; I see 
others as part of the research effort. If I were an evaluator,
I would probably see the others as part of the evaluation effort;
and if a developer, I would see others as part of the development 
effort. They're really all part of the same repetitive process, 
a cyclical process, a process which absolutely demands feedback.
Any model that conceptualizes this domain as a linear one, 
of one thing leading to another, is absolutely incorrect. I 
will be very dogmatic on that point. Without feedback, we 
cannot proceed properly in this domain. 

I presume what I'm saying is that the research function
is a more delimited function than the evaluation function,
and they move from different premises. The research function
to a large extent is imbedded, although not exclusively, in
quantitative analysis.

I would say research is imbedded in conceptual analysis. 

If you say conceptual analysis, then it makes everything else
a part of it. I would start with research being the conceptual 
process, while the quantitative techniques of evaluation and 
development would be part of that process. Always, however, 
the critical aspect of this is the conceptual process. The 
decisions are going to be made conceptually in terms of values, 
so I would put primacy on the research side, but then again,
I am a researcher.

Sam, I think that's a good note on which to conclude. 




